If you find the idea of audio signals in the time and frequency domains interesting, you may want to look at the lecture notes on modulation, though we probably won't cover this in class.  We will be coming back to this material when we cover the Fourier and Cosine transforms.

Digital Audio

An audio CD has 44,100 samples per second at 16 bits per sample.  Continuing with the example of a 440 Hz music signal, this would be sampled at discrete intervals, quantized, and digitized.

In this example, the 16-bit amplitude quantization divides the range from -1 to 1 into 2^16 = 65,536 intervals, for a minimum resolution of 2/65,536 = 1/32,768 ≈ 0.00003, which is much finer than we could see on this graph.

The sample-and-hold interval is 1/44,100 ≈ 0.0000227 seconds per sample, which is evident in the graph.

The process of sampling and quantizing a signal is called pulse code modulation (PCM).
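The sampling and quantizing steps can be sketched in a few lines of Python. This is only an illustration, not how a CD player is implemented: the function names are made up for this example, and scaling by 32,767 is one common convention for signed 16-bit codes (an assumption, not something taken from the notes).

```python
import math

SAMPLE_RATE = 44_100   # samples per second, as on an audio CD
BITS = 16              # bits per sample
FREQ = 440.0           # Hz, the example tone

def pcm_encode(num_samples, freq=FREQ, rate=SAMPLE_RATE, bits=BITS):
    """Sample a unit-amplitude sine and quantize each sample to a signed integer code."""
    max_code = 2 ** (bits - 1) - 1               # 32767 for 16 bits (assumed convention)
    codes = []
    for n in range(num_samples):
        t = n / rate                             # sampling instant in seconds
        x = math.sin(2 * math.pi * freq * t)     # "analog" value in [-1, 1]
        codes.append(round(x * max_code))        # quantize to the nearest integer level
    return codes

def pcm_decode(codes, bits=BITS):
    """Map integer codes back to amplitudes in [-1, 1]."""
    max_code = 2 ** (bits - 1) - 1
    return [c / max_code for c in codes]

codes = pcm_encode(100)
decoded = pcm_decode(codes)

# Largest difference between the decoded value and the true sine at the same instant
worst = max(
    abs(d - math.sin(2 * math.pi * FREQ * n / SAMPLE_RATE))
    for n, d in enumerate(decoded)
)
print(codes[:5])
print(worst)
```

The quantization error printed at the end never exceeds half of one quantization step, which is why the amplitude steps are invisible on a graph while the time steps are not.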


The Nyquist Limit

Does the stair-step effect in the PCM 440 Hz signal affect the sound quality?

How do we know that we can get away with using sampled digital representations of sound?

The sampling theorem:  If the Fourier transform of a signal is always 0 at frequencies higher than W Hz, then it can be accurately represented using 2W equally spaced samples per second.

This can also be stated as: "sample at twice the highest frequency of interest".  The sampling rate 2W is called the Nyquist limit (it is more commonly called the Nyquist rate).

Since the upper limit of human hearing is about 20 kHz, sampling at 40 kHz or more loses only inaudible high frequencies.  This is why 44.1 kHz, slightly above 40 kHz, was selected as the audio CD sampling rate.  (Combined with a desire on the part of the music industry to come up with a frequency that was not easily reproduced by existing digital devices.)


Aliasing

I played and recorded a steady Bb4 (474 Hz) on clarinet.  I then digitized the audio tape, sampling 20,000 times per second.  A Fourier transform was applied to the digitized signal.  The results are shown in the figure:


The components of the Fourier transform are:

The highest frequency present is 3789 Hz.  The top of the musical range is at about 4000 Hz.  Therefore, we can sample at 8000 Hz and see all the frequencies up to 4000 Hz.

I then recorded a C4 (266 Hz) on bassoon and sampled it on a computer that was supposed to do 8 kHz sampling.  However, the actual sampling rate turned out to be 4 kHz.  After doing a Fourier transform and analyzing the data, I got:

Since the fundamental is 266 Hz, the 2nd through 12th harmonics should be at 532, 798, 1064, 1330, 1596, 1862, 2128, 2394, 2660, 2926, and 3192 Hz.  We sampled at 4000 Hz, so the folding frequency is 2000 Hz.  The 8th through 12th harmonics lie above the folding frequency and fold back to 4000 - f, so they show up at 1872, 1606, 1340, 1074, and 808 Hz.  In this signal, the 3rd, 8th and 11th harmonics are missing.

 

 

DC means frequency 0.  The DC signal is just the average value.
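As a small check of this statement, the sketch below computes frequency bin 0 of a discrete Fourier transform directly from its definition and compares it with the plain average. The helper name `dft_bin` and the test signal (an offset of 0.25 plus a sine) are made up for this example.

```python
import cmath
import math

def dft_bin(samples, k):
    """One bin of the discrete Fourier transform, computed from the definition."""
    n = len(samples)
    return sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))

# Hypothetical test signal: a constant offset of 0.25 plus a sine
# that completes exactly 3 cycles in the 64-sample window.
N = 64
samples = [0.25 + math.sin(2 * math.pi * 3 * i / N) for i in range(N)]

dc = dft_bin(samples, 0).real / N   # bin 0, scaled by the number of samples
mean = sum(samples) / N             # ordinary average value

print(dc, mean)
```

At k = 0 the complex exponential is 1 for every sample, so bin 0 is simply the sum of the samples, and dividing by the sample count gives the average.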

Say we sample a signal at frequency fs. The folding frequency is ff = fs/2.  After we take a Fourier transform, we can go to the frequency domain and see which components are present.  For example, let's say we sampled at 8,000 Hz. Suppose that our original signal was a pure sine wave at 7,000 Hz.  We will see this in the frequency domain.  We will also see aliased signals at -7,000 Hz, -1,000 Hz, 1,000 Hz and 9,000 Hz.

Suppose that the original signal had been a 1,000 Hz sine wave and we sample at 8,000 Hz.  It will look identical to the 7,000 Hz signal.  If we go back to the time domain and plot the sampled points, we will find that the 1,000 and 7,000 Hz waves give the same data (for sine waves, apart from a flip in sign, which is only a phase difference).  If we take the digitized signal, send it through a digital to analog converter (DAC) and then to a speaker, we will hear 1,000 Hz.
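The claim about the sampled points can be checked numerically. The sketch below samples 1,000 Hz and 7,000 Hz waves at 8,000 Hz and compares the values point by point. The helper `sample` is made up for this example. Cosine waves give matching samples, while sine waves match after a sign flip, which is a phase difference only.

```python
import math

FS = 8000  # sampling rate in Hz

def sample(func, freq, count, fs=FS):
    """Evaluate func(2*pi*freq*t) at the sampling instants t = n / fs."""
    return [func(2 * math.pi * freq * n / fs) for n in range(count)]

cos_1k = sample(math.cos, 1000, 16)
cos_7k = sample(math.cos, 7000, 16)
sin_1k = sample(math.sin, 1000, 16)
sin_7k = sample(math.sin, 7000, 16)

# Cosines: the two sets of samples coincide (up to floating-point rounding)
same_cos = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(cos_1k, cos_7k))

# Sines: the samples coincide after flipping the sign of one of them
flipped_sin = all(math.isclose(a, -b, abs_tol=1e-9) for a, b in zip(sin_1k, sin_7k))

print(same_cos, flipped_sin)
```

Once only the samples are kept, nothing in the data distinguishes the two frequencies.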

After we have digitized a signal, we cannot undo aliasing.  Nature makes the assumption that only frequencies from DC to ff are present.  In this case, that means 0 to 4000 Hz.

Aliases of sampled signals always exist.  Any signal at frequency f will have aliases at n fs + f and n fs - f, for every integer n.  If we obey Nyquist, then we can safely ignore all negative frequencies and all frequencies greater than the folding frequency.  In other words, if we sample at 8,000 Hz, we should only expect to see frequencies between 0 and 4,000 Hz.
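The alias rule can be written out as a short sketch. The function names `aliases` and `apparent_frequency` are made up for this example: the first lists n fs + f and n fs - f for a few values of n, and the second folds any frequency into the range from DC to the folding frequency.

```python
def aliases(f, fs, n_max=1):
    """All frequencies n*fs + f and n*fs - f for n from -n_max to n_max."""
    found = set()
    for n in range(-n_max, n_max + 1):
        found.add(n * fs + f)
        found.add(n * fs - f)
    return sorted(found)

def apparent_frequency(f, fs):
    """Frequency between 0 and fs/2 at which a tone at f appears after sampling at fs."""
    r = f % fs               # drop whole multiples of the sampling rate
    return min(r, fs - r)    # fold about the folding frequency fs/2

print(aliases(1000, 8000))
print(apparent_frequency(7000, 8000))
```

With fs = 8,000 Hz, the alias list for 1,000 Hz contains 7,000 Hz, and the 7,000 Hz tone folds to 1,000 Hz, matching the example in the text.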

 

Before sampling at twice the frequency of interest, we should apply a low-pass filter to remove all frequencies higher than the folding frequency.  That will prevent the 7,000 Hz signal from showing up as 1,000 Hz.
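The effect of filtering first can be simulated. In the sketch below, a densely sampled signal (64,000 Hz, an assumed stand-in for the analog input) contains a wanted 500 Hz tone and an unwanted 7,000 Hz tone. It is reduced to 8,000 Hz twice, once directly and once after a low-pass filter. The windowed-sinc filter, its 3,000 Hz cutoff, the 101-tap length, and all function names are choices made for this illustration only.

```python
import math

HIGH_RATE = 64_000            # dense rate standing in for the analog signal (assumption)
FS = 8_000                    # target sampling rate
STEP = HIGH_RATE // FS        # keep every 8th dense sample

def lowpass_taps(cutoff_hz, rate, num_taps=101):
    """Hamming-windowed sinc low-pass FIR filter, normalized to unit gain at DC."""
    mid = (num_taps - 1) / 2
    fc = cutoff_hz / rate                      # cutoff in cycles per sample
    taps = []
    for i in range(num_taps):
        k = i - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]

def fir_filter(signal, taps):
    """Apply the filter, keeping only outputs where the taps fully overlap the signal.
    The taps are symmetric, so this sliding dot product equals convolution."""
    n = len(taps)
    return [sum(taps[j] * signal[i + j] for j in range(n))
            for i in range(len(signal) - n + 1)]

def tone_amplitude(samples, freq, rate):
    """Estimate the amplitude of one frequency by correlating with cosine and sine."""
    n = len(samples)
    re = sum(x * math.cos(2 * math.pi * freq * i / rate) for i, x in enumerate(samples))
    im = sum(x * math.sin(2 * math.pi * freq * i / rate) for i, x in enumerate(samples))
    return 2 * math.hypot(re, im) / n

taps = lowpass_taps(3000, HIGH_RATE)
length = 2048 + len(taps) - 1                  # so the filtered signal has 2048 samples
dense = [math.sin(2 * math.pi * 500 * i / HIGH_RATE)
         + 0.5 * math.sin(2 * math.pi * 7000 * i / HIGH_RATE)
         for i in range(length)]

raw = dense[:2048:STEP]                        # sampled with no filter
clean = fir_filter(dense, taps)[::STEP]        # filtered, then sampled

raw_alias = tone_amplitude(raw, 1000, FS)      # energy appearing at 1,000 Hz
clean_alias = tone_amplitude(clean, 1000, FS)
clean_tone = tone_amplitude(clean, 500, FS)    # the wanted 500 Hz tone

print(raw_alias, clean_alias, clean_tone)
```

Without the filter, the 7,000 Hz tone reappears at 1,000 Hz at close to its original amplitude. With the filter, the 1,000 Hz alias nearly vanishes while the 500 Hz tone passes almost unchanged.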

In images, the physical properties of the photo-detectors in a digital camera act as a low-pass filter.


Frequency Masking

The human ear has a range of 10 octaves with extremely good discrimination and dynamic range.

The basilar membrane of the inner ear plays a key role.  It divides two fluid-filled chambers of the cochlea and contacts the hairs on the organ of Corti.  These hairs are the transducers that produce nerve signals to the brain.  The basilar membrane varies in width, thickness, and rigidity along its length.  Different areas of the membrane vibrate at different frequencies.

The basilar membrane has a number of distinct regions (perhaps 24).  Each region can vibrate over a small range of frequencies, but a single region can only vibrate at one frequency at a time.  Thus there are a limited number of frequencies that can be heard at any one time.  Each region of the basilar membrane responds to the strongest frequency within its range and is unaffected by any smaller stimulus.  This effect is called frequency masking. 

MP3 takes advantage of frequency masking for audio compression.

1) Frequencies close to a stronger frequency but at a lower amplitude can be ignored.

2) A relatively small number of bits can be used to encode the prime stimulus in the band.  If this quantization results in spurious frequencies, the ear will ignore them.
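The first idea can be illustrated with a toy sketch that keeps only the strongest component in each frequency band. This is not the MP3 psychoacoustic model: the band edges, the tone list, and the function name are all hypothetical values chosen for illustration.

```python
def keep_strongest_per_band(components, band_edges):
    """Keep the largest-amplitude component in each band.

    components: list of (frequency_hz, amplitude) pairs
    band_edges: ascending list of band boundaries in Hz
    """
    strongest = {}
    for freq, amp in components:
        # Find the band whose interval [low, high) contains this frequency
        for b in range(len(band_edges) - 1):
            if band_edges[b] <= freq < band_edges[b + 1]:
                if b not in strongest or amp > strongest[b][1]:
                    strongest[b] = (freq, amp)
                break
    return [strongest[b] for b in sorted(strongest)]

# Hypothetical band edges and components, chosen only for illustration
edges = [0, 500, 1000, 2000, 4000]
tones = [(440, 1.0), (470, 0.1), (900, 0.3), (1500, 0.05), (1800, 0.6)]

kept = keep_strongest_per_band(tones, edges)
print(kept)
```

Here the weak 470 Hz tone is dropped because the stronger 440 Hz tone shares its band, and the 1,500 Hz tone is dropped in favor of the 1,800 Hz tone.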